13 research outputs found

    Evaluation-as-a-service for the computational sciences: overview and outlook

    Evaluation in empirical computer science is essential to show progress and to assess the technologies developed. Several research domains such as information retrieval have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted up until now. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely large data sets, confidential data sets as found in the medical domain, and rapidly changing data sets as often encountered in industry. Crowdsourcing has also changed the way in which industry approaches problem-solving, with companies now organizing challenges and handing out monetary awards to incentivize people to work on their problems, particularly in the field of machine learning. This article is based on discussions at a workshop on Evaluation-as-a-Service (EaaS). EaaS is the paradigm of not providing data sets to participants to work on locally, but keeping the data central and allowing access via Application Programming Interfaces (APIs), Virtual Machines (VMs), or other mechanisms for shipping executables. The objectives of this article are to summarize and compare the current approaches and to consolidate the experiences gained with them in order to outline the next steps of EaaS, particularly toward sustainable research infrastructures. The article summarizes several existing approaches to EaaS and analyzes their usage scenarios as well as their advantages and disadvantages. The many factors influencing EaaS are summarized, as is the environment in terms of the motivations of the various stakeholders, from funding agencies to challenge organizers, researchers and participants, to industry interested in supplying real-world problems for which they require solutions. EaaS solves many problems of the current research environment, where data sets are often not accessible to many researchers. Executables of published tools are equally often unavailable, making the reproducibility of results impossible. EaaS, however, creates reusable/citable data sets as well as available executables. Many challenges remain, but such a framework for research can also foster more collaboration between researchers, potentially increasing the speed of obtaining research results.
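
    To make the data-to-the-code idea more concrete, the sketch below shows what a participant-side client of a central evaluation service might look like: the participant registers a containerized system and later retrieves only the computed scores, never the raw data. This is a minimal illustration, not the API of any existing EaaS platform; the base URL, endpoint paths, JSON fields, and the submit_run/fetch_scores helpers are all assumptions.

```python
import requests

# Hypothetical EaaS endpoint and token; real platforms each define their own
# APIs, so every name here is purely illustrative.
BASE_URL = "https://eaas.example.org/api/v1"
HEADERS = {"Authorization": "Bearer <participant-token>"}

def submit_run(task_id: str, image_ref: str) -> str:
    """Register a containerized system for server-side evaluation.

    Instead of downloading the (possibly confidential) data set, the
    participant ships an executable; the platform runs it next to the data.
    """
    resp = requests.post(
        f"{BASE_URL}/tasks/{task_id}/submissions",
        json={"container_image": image_ref},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["submission_id"]

def fetch_scores(task_id: str, submission_id: str) -> dict:
    """Poll for the evaluation results computed on the central data."""
    resp = requests.get(
        f"{BASE_URL}/tasks/{task_id}/submissions/{submission_id}/scores",
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example usage (hypothetical task and image identifiers):
# sid = submit_run("bioasq-task-a", "registry.example.org/myteam/system:v1")
# print(fetch_scores("bioasq-task-a", sid))
```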

    Results of the Seventh Edition of the BioASQ Challenge

    The results of the seventh edition of the BioASQ challenge are presented in this paper. The aim of the BioASQ challenge is to promote systems and methodologies for large-scale biomedical semantic indexing and question answering through the organization of a challenge on these tasks. In total, 30 teams with more than 100 systems participated in the challenge this year. As in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research. © 2020, Springer Nature Switzerland AG

    Drug-Drug Interaction Prediction on a Biomedical Literature Knowledge Graph

    Knowledge Graphs provide insights from data extracted in various domains. In this paper, we present an approach for discovering probable drug-drug interactions through the generation of a Knowledge Graph from disease-specific literature. The Graph is generated using natural language processing and semantic indexing of biomedical publications and open resources. The semantic paths connecting different drugs in the Graph are extracted and aggregated into feature vectors representing drug pairs. A classifier is trained on known interactions, extracted from a manually curated drug database used as a gold standard, and discovers new possible interacting pairs. We evaluate this approach on two use cases, Alzheimer’s Disease and Lung Cancer. Our system is shown to outperform competing graph embedding approaches, while also identifying new drug-drug interactions that are validated retrospectively. © 2020, Springer Nature Switzerland AG
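
    The path-aggregation step can be sketched as follows, using a toy hand-built graph in place of the literature-derived Knowledge Graph and a generic random-forest classifier rather than the paper's exact model; the relations, drugs, labels, and the pair_features helper are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

import networkx as nx
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Toy literature-derived graph; in the paper the graph is built by NLP and
# semantic indexing of publications, which is out of scope for this sketch.
G = nx.Graph()
G.add_edge("drugA", "geneX", relation="inhibits")
G.add_edge("drugB", "geneX", relation="upregulates")
G.add_edge("drugA", "diseaseY", relation="treats")
G.add_edge("drugC", "diseaseY", relation="treats")

def pair_features(graph: nx.Graph, d1: str, d2: str, cutoff: int = 3) -> dict:
    """Aggregate the semantic paths between two drugs into a bag of
    relation-sequence features (one count per distinct path signature)."""
    feats: Counter = Counter()
    for path in nx.all_simple_paths(graph, d1, d2, cutoff=cutoff):
        signature = "->".join(
            graph[u][v]["relation"] for u, v in zip(path, path[1:])
        )
        feats[signature] += 1
    return dict(feats)

drugs = ["drugA", "drugB", "drugC"]
pairs = list(combinations(drugs, 2))
X_dicts = [pair_features(G, a, b) for a, b in pairs]
# Labels would come from the curated interaction database (the gold
# standard); here they are placeholders.
y = [1, 0, 0]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(X_dicts)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X))  # predicted interaction labels for the drug pairs
```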

    Splice site recognition using transfer learning

    In this work, we consider a transfer learning approach based on K-means for splice site recognition. We use different representations for the sequences, based on n-gram graphs. In addition, a novel representation based on the secondary structure of the sequences is proposed. We evaluate our approach on genomic sequence data from model organisms of varying evolutionary distance. The first results obtained indicate that the proposed representations are promising for the problem of splice site recognition. © 2014 Springer International Publishing
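
    One simple reading of a K-means-based transfer scheme is sketched below: cluster labeled sequences from a source organism, assign each cluster the majority label of its members, and label sequences from a target organism by their nearest cluster. Plain character n-gram counts stand in for the n-gram graph and secondary-structure representations of the paper, and all sequences and labels are toy placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Toy windows around candidate splice sites from a "source" organism
# (labeled) and a "target" organism (unlabeled); 1 marks a true splice site.
src_seqs = ["AAGGTAAGT", "CCGGTGAGT", "ACGTACGTA", "TTTTACGCA"]
src_labels = np.array([1, 1, 0, 0])
tgt_seqs = ["AAGGTGAGT", "GTTTACGCA"]

# Character 3-gram counts as a stand-in for richer n-gram graph features.
vec = CountVectorizer(analyzer="char", ngram_range=(3, 3))
X_src = vec.fit_transform(src_seqs).toarray()
X_tgt = vec.transform(tgt_seqs).toarray()

# Cluster the source data, give each cluster the majority label of its
# members, then transfer those labels to the target sequences by assigning
# each target sequence to its nearest cluster.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_src)
cluster_label = {
    c: int(np.round(src_labels[km.labels_ == c].mean()))
    for c in range(km.n_clusters)
}
predictions = [cluster_label[c] for c in km.predict(X_tgt)]
print(predictions)  # transferred splice-site labels for the target sequences
```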

    Towards Open Domain Event Extraction from Twitter: REVEALing Entity Relations

    In recent years, social media services have received content contributions from millions of users, making them a fruitful source for data analysis. In this paper, we present a novel approach for mining Twitter data in order to extract factual information concerning trending events. Our approach is based on relation extraction between named entities, such as people, organizations, and locations. The experiments and the obtained results suggest that relation extraction can help in extracting events in social media, when combined with pre- and post-processing steps.
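
    A minimal baseline for the entity-relation idea is sketched below: run named entity recognition over tweets and count which entity pairs co-occur, treating frequent pairs as candidate relations for an event. It uses an off-the-shelf spaCy model on two made-up tweets and omits the pre- and post-processing steps of the paper; it is not the REVEAL system itself.

```python
from collections import Counter
from itertools import combinations

import spacy  # assumes the en_core_web_sm model is installed

nlp = spacy.load("en_core_web_sm")

tweets = [
    "Apple opens a new research lab in Zurich.",
    "Tim Cook visits Zurich to meet Apple engineers.",
]

# Count how often two named entities co-occur in the same tweet; frequent
# pairs become candidate relations describing the trending event.
pair_counts: Counter = Counter()
for tweet in tweets:
    doc = nlp(tweet)
    ents = [(e.text, e.label_) for e in doc.ents
            if e.label_ in {"PERSON", "ORG", "GPE"}]
    for a, b in combinations(sorted(set(ents)), 2):
        pair_counts[(a, b)] += 1

for (a, b), count in pair_counts.most_common(5):
    print(f"{a[0]} ({a[1]})  --related-to--  {b[0]} ({b[1]})  x{count}")
```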

    Modeling the off-target effects of CRISPR-Cas9 experiments for the treatment of Duchenne Muscular Dystrophy

    Duchenne Muscular Dystrophy (DMD) is a neuromuscular disorder caused by the absence of the dystrophin protein. If left untreated, it causes movement problems by the age of 10-12 years, and death occurs between 20 and 30 years of age due to heart failure. There is currently no cure for this disease, only symptomatic treatment. Genome editing approaches such as the CRISPR-Cas9 technology can provide new opportunities to ameliorate the disease by eliminating DMD mutations and restoring dystrophin expression. While it is true that on-target activity can be influenced by guide specificity, the proposed approach focuses on the devastating results that off-target cleavage can cause (e.g., unexpected mutations). This is why reducing off-target effects is the first priority in guide design. The rapid growth of the Artificial Intelligence field has helped researchers employ automated feature extraction and Machine Learning approaches to evaluate potential off-target scores. This work presents our approach to evaluating off-targets of CRISPR-Cas9 gene editing specifically for the DMD disorder, using Machine Learning. We offer a comparison between four regression methods that predict the insertions-deletions (indels) produced for a given pair of guide RNA and corresponding off-target site, and evaluate the results using the Spearman correlation metric. We propose the most suitable method for this problem, a Decision Tree Regressor, and compare the results with some state-of-the-art tools. With cross-validation, the performance of our tool is better than the independent performance of the other tools, except for Elevation, which performed about as well as ours. © 2022 ACM
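
    A stripped-down version of such a regression setup is sketched below: each guide/off-target pair is one-hot encoded position by position, a Decision Tree Regressor is trained, and cross-validated predictions are scored with Spearman correlation. The encoding, hyperparameters, and the randomly generated sequences and indel frequencies are illustrative assumptions, not the paper's data or exact pipeline.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

NUCS = "ACGT"

def encode_pair(guide: str, off_target: str) -> np.ndarray:
    """One-hot encode each aligned (guide, off-target) nucleotide pair,
    i.e. 16 indicator features per position."""
    feats = np.zeros((len(guide), 16))
    for i, (g, o) in enumerate(zip(guide, off_target)):
        feats[i, NUCS.index(g) * 4 + NUCS.index(o)] = 1.0
    return feats.ravel()

# Toy 20-nt guide/off-target pairs with made-up indel frequencies; real
# training data would come from published off-target cleavage experiments.
rng = np.random.default_rng(0)
guides = ["".join(rng.choice(list(NUCS), 20)) for _ in range(50)]
offs = ["".join(rng.choice(list(NUCS), 20)) for _ in range(50)]
y = rng.random(50)

X = np.stack([encode_pair(g, o) for g, o in zip(guides, offs)])
model = DecisionTreeRegressor(max_depth=5, random_state=0)
pred = cross_val_predict(model, X, y, cv=5)  # cross-validated predictions
rho, _ = spearmanr(y, pred)                  # rank agreement with truth
print(f"Spearman correlation: {rho:.3f}")
```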

    Analysis and classification of constrained DNA elements with N-gram graphs and genomic signatures

    Most common methods for interrogating genomic sequence composition are based on the bag-of-words approach and thus largely ignore the original sequence structure or the relative positioning of its constituent oligonucleotides. We here present a novel methodology that takes into account both word representation and relative positioning at various length scales, in the form of n-gram graphs (NGG). We implemented the NGG approach on short vertebrate and invertebrate constrained genomic sequences of various origins and predicted functionalities, and were able to efficiently distinguish DNA sequences belonging to the same species (intra-species classification). As an alternative method, we also applied the Genomic Signatures (GS) approach to the same sequences. To our knowledge, this is the first time that GS have been applied to short sequences, rather than whole genomes. Together, the presented results suggest that NGG is an efficient method for classifying sequences originating from a given genome according to their function. © 2014 Springer International Publishing
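
    A simplified n-gram graph can be built as sketched below: overlapping n-grams become nodes, and n-grams that co-occur within a small window are connected by weighted edges; a crude edge-overlap score then compares two sequences. The full NGG framework uses richer, weight-aware similarity operators, and the window size, n, and example sequences here are arbitrary choices for illustration.

```python
import networkx as nx

def ngram_graph(seq: str, n: int = 3, window: int = 3) -> nx.Graph:
    """Build a simplified n-gram graph: nodes are overlapping n-grams, and
    two n-grams are connected (with a co-occurrence weight) when they appear
    within `window` positions of each other."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    g = nx.Graph()
    g.add_nodes_from(set(grams))
    for i, a in enumerate(grams):
        for b in grams[i + 1:i + 1 + window]:
            if g.has_edge(a, b):
                g[a][b]["weight"] += 1
            else:
                g.add_edge(a, b, weight=1)
    return g

def edge_jaccard(g1: nx.Graph, g2: nx.Graph) -> float:
    """Crude graph similarity: Jaccard overlap of the edge sets (the NGG
    framework itself uses richer, weight-aware similarity measures)."""
    e1 = {frozenset(e) for e in g1.edges()}
    e2 = {frozenset(e) for e in g2.edges()}
    return len(e1 & e2) / max(len(e1 | e2), 1)

# Two arbitrary toy sequences: a graph is perfectly similar to itself and
# much less similar to a compositionally different sequence.
g_a = ngram_graph("ATGGCCATTGTAATGGGCCGCTG")
g_b = ngram_graph("TTTTTTTTAAAAAAAACCCCCCCC")
print(edge_jaccard(g_a, g_a), edge_jaccard(g_a, g_b))
```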

    Overview of BioASQ 2020: The Eighth BioASQ Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

    In this paper, we present an overview of the eighth edition of the BioASQ challenge, which ran as a lab in the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges aiming at the promotion of systems and methodologies for large-scale biomedical semantic indexing and question answering. To this end, shared tasks have been organized yearly since 2012, in which different teams develop systems that compete on the same demanding benchmark datasets representing the real information needs of experts in the biomedical domain. This year, the challenge was extended with the introduction of a new task on medical semantic indexing in Spanish. In total, 34 teams with more than 100 systems participated in the three tasks of the challenge. As in previous years, the results of the evaluation reveal that the top-performing systems managed to outperform the strong baselines, which suggests that state-of-the-art systems keep pushing the frontier of research through continuous improvements. © 2020, Springer Nature Switzerland AG